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PRECONDITIONED EIGENSOLVERS FOR LARGE-SCALE NONLINEAR 
HERMITIAN EIGENPROBLEMS WITH VARIATIONAL 
CHARACTERIZATIONS. H. INTERIOR EIGENVALUES* 

DANIEL B. SZYLDt, EUGENE VECHARYNSKI*, AND FEI XUE§ 


Abstract. We consider the solution of large-scale nonlinear algebraic Hermitian eigenproblems of the 
form T{X)v = 0 that admit a variational characterization of eigenvalues. These problems arise in a variety of 
applications and are generalizations of linear Hermitian eigenproblems Av = XBv. In this paper, we propose 
a Preconditioned Locally Minimal Residual (PLMR) method for efficiently computing interior eigenvalues of 
problems of this type. We discuss the development of search subspaces, preconditioning, and eigenpair extrac¬ 
tion procedure based on the refined Rayleigh-Ritz projection. Extension to the block methods is presented, 
and a moving-window style soft defiation is described. Numerical experiments demonstrate that PLMR meth¬ 
ods provide a rapid and robust convergence towards interior eigenvalues. The approach is also shown to 
be efficient and reliable for computing a large number of extreme eigenvalues, dramatically outperforming 
standard preconditioned conjugate gradient methods. 

AIMS subject classifications. 65F15, 65F50, 15A18, 15A22. 

1. Introduction. Nonlinear Hermitian algebraic eigenproblems of the form T{X)v = 0 
arise naturally in a variety of scientific and engineering applications. Many of these problems 
allow for a variational characterization (min-max principle) of some eigenvalues on certain 
intervals. Desirable properties of these eigenvalues and associated eigenvectors can be derived, 
and special methods can be developed to compute them efficiently. In Part I of this study p2] , 
we investigated Preconditioned Conjugate Gradient (PCG) methods for computing extreme 
eigenvalues of the nonlinear Hermitian eigenproblem that satisfy a variational principle. In 
this paper, to continue our study, we develop and explore Preconditioned Locally Minimal 
Residual (PLMR) methods for computing interior eigenvalues. Our exploration was motivated 
by a new class of preconditioned eigensolvers for linear eigenproblems [33], |35j . 

Interior eigenvalues are intrinsically more difficult to compute than extreme eigenvalues. 
This is the case even for linear Hermitian problems Av = XBv. To devise an efficient and 
reliable interior eigenvalue solver, several issues need to be addressed. First, a good precondi¬ 
tioner M approximating A — aB must be available, where cr is a real shift close to the desired 
eigenvalues. Second, appropriate variants of subspace projection and eigenpair extraction 
should be used to provide a rapid and robust convergence towards interior eigenvalues. In 
particular, the extraction should identify and discard spurious Ritz values. Exactly the same 
types of challenges arise when solving nonlinear interior eigenproblems. 

For the first issue, we note that efficient and robust preconditioners for interior eigenvalue 
computations are typically constructed with indefinite matrices. Their development is rather 
challenging in general, and is out of the scope of this paper. Nevertheless, several options 
are readily available, e.g., the incomplete LDL"”" factorization [T^, [55]) the absolute value 
preconditioning |34] . or any iterative solver for a corresponding indefinite linear system. Here, 
we assume that a suitable preconditioner is at hand, and focus on the development of search 
subspaces and mechanisms for extracting approximate eigenpairs. 
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The search subspaces suggested within the proposed PLMR method are given by certain 
preconditioned Krylov subspaces that are augmented with a search direction connecting the 
eigenvector approximations obtained in two consecutive iterations. Therefore, they can be 
viewed as a natural generalization of the PCG search subspaces [32]. The main difference 
with PCG is that the PLMR search subspace is based on an augmented Krylov subspace of 
a larger dimension, which enhances the robustness of convergence. Nevertheless, if a good 
preconditioner is available, the PLMR subspace does not have to be significantly larger than 
that of PCG. Therefore, similar to PCG, the eigenpair approximations in PLMR are normally 
extracted from subspaces of a small size. 

In order to extract interior eigenpairs of linear eigenproblems, the harmonic Rayleigh- 
Ritz pocedure |21j . |28j is widely used. The same idea can be applied in the nonlinear case, 
which leads to the projected nonlinear eigenproblem U*T{a)*T{i')Uy = 0, where U contains 
basis vectors of the search subspace. Then, the harmonic Ritz pairs {v, Uy) with u close to a 
provide approximations to the desired interior eigenpairs of the original problem. 

The main disadvantage of the harmonic Rayleigh-Ritz projection is that it does not 
preserve symmetry, i.e., the projected problem U*T(a)*T{v)Uy = 0 is no longer Hermitian. 
The loss of symmetry is unlikely to be a major issue for linear problems. However, in the 
nonlinear case, solving projected eigenproblems that do not preserve the original structure 
could cause considerable complications. In particular, it may require special treatment of 
invariant pairs |7|, and is likely to incur significant loss of accuracy in the final solutions. 

We avoid this issue by using the standard Rayleigh-Ritz procedure. The resulting pro¬ 
jected eigenproblem U*T{v)Uy = 0 is also Hermitian, with eigenvalues satisfying the vari¬ 
ational principle. To remedy the slow convergence towards interior eigenvalues, which is 
commonly intrinsic to the standard Rayleigh-Ritz approach, we propose a simple strategy 
for discarding spurious Ritz values, followed by an eigenvector refinement procedure (see [12] 
for linear eigenproblems) that stabilizes and accelerates the convergence. Our experience 
shows that such a refined projection outperforms the harmonic Rayleigh-Ritz approach, and 
is crucial for maintaining robust convergence. We observe that the effects of the eigenvector 
refinement are significantly more pronounced in the nonlinear setting. 

To understand the local convergence of the new method, we shall discuss a close connec¬ 
tion between the PLMR search subspace and that of the basic Jacobi-Davidson (JD) method 
using the right-preconditioned GMRES as a solver for the correction equation. This con¬ 
nection allows derivation of the order of local convergence of PLMR, established under an 
assumption on the approximation property of the refined Rayleigh-Ritz procedure. Our anal¬ 
ysis shows that PLMR with a search subspace of a fixed size converges linearly, and it exhibits 
a higher order of convergence if the search subspace is expanded with every new iteration. 

For the case where several eigenpairs are wanted, we present a block variant of the PLMR 
method, called BPLMR. To the best of our knowledge, BPLMR is the first block variant of a 
preconditioned eigensolver for computing interior eigenvalues of nonlinear eigenproblems. In 
this algorithm, a special care is taken to ensure the robustness of the eigenvector refinement 
procedure, which is enhanced to avoid repeated convergence of semi-simple and clustered 
eigenvalues. Moreover, special attention is devoted to computing a large number of successive 
eigenvalues, for which a moving-window-style soft deflation strategy is described. 

The proposed PLMR methods use several well-established techniques that contribute to 
fast and robust convergence towards interior eigenvalues. They share similarities with the 
nonlinear Arnoldi method |36| as both are “preconditioned eigensolvers” based on projections 
onto the Krylov-like search subspaces constructed by a preconditioned linear operator. They 
also possess features of the nonlinear Jacobi-Davidson method 0, |35| in the use of stabilized 
preconditioners. Consequently, we expect that PLMR performs at least as well as nonlinear 
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Arnoldi and JD. In addition, the suggested eigenvector refinement procedure further improves 
the convergence, especially if the preconditioner is not very strong, or if clustered or semi¬ 
simple eigenvalues are desired. 

The paper is organized as follows. Section 2 reviews basics of nonlinear Hermitian eigen- 
problems, including a nonlinear variational principle. In Section 3, we propose a basic PLMR 
method for computing one interior eigenvalue of T(X)v = 0 around a given shift. Section 4 
provides an insight into the connection between the search subspaces of PLMR and of the 
right-preconditioned GMRES used as an inner solver for a basic JD method, leading to a 
local convergence result for PLMR. In Section 5, we develop BPLMR for computing several 
interior eigenvalues simultaneously. Numerical results, which demonstrate the efficiency of 
PLMR methods, are presented in Section 6. Our conclusions can be found in Section 7. 

2. Nonlinear Hermitian eigenproblem and variational principle. In this section, 
we describe nonlinear algebraic Hermitian eigenproblems T{X)v = 0 that admit a variational 
characterization on an open interval J. Here, T(-) : J C K —> C"^" maps a real scalar /i S J 
continuously to the Hermitian matrix The scalar A S J, for which T(A) is singular, is 

an eigenvalue of T(-) with a corresponding eigenvector v S nullT(A) \ {0}. 

To simplify our analysis, we assume, as in Part I |32j . that T{-) does not have infinite 
eigenvalues on J. This assumption is valid for most Hermitian eigenproblems encountered 
in practice. Under the assumption, J = (a, b) containing all eigenvalues of interest is finite, 
where a and b are not eigenvalues of T(-). In certain circumstances, T{-) does have infinite 
eigenvalues (for instance, linear Hermitian eigenproblems Av = XBv with a semi-definite B), 
but those eigenvalues generally have little physical relevance and thus are rarely desired. 

We start the description with several definitions. 

Definition 2.1. The Rayleigh functional p{-) : D ^ J is a continuous mapping of a 
vector X G D C C" \ {0} to the unique solution p{x) G J of the equation x*T{p(x))x = 0. 

Definition 2.2. Given T{-) : J C M —>■ C"^", J C K is called an interval of positive or 
negative type, if (p — p{x)){x*T{p)x) is constantly positive or constantly negative, respectively, 
for all X G D and all p G J, p ^ p{x). Both positive and negative type are definite type. 

Definition 2.3. A real scalar X is the fc-th eigenvalue ofT{-) if zero is the k-th largest 
eigenvalue of the matrix T[X). Unless noted otherwise, the k-th eigenvalue is denoted as Xk- 

Necessary and sufficient conditions for J to be of definite type are given as follows. 

Proposition 2.4 (Proposition 2.4 in [32]). Let J = {a,b) C K be finite, wherea,b are not 
eigenvalues of T{-), and let p : D ^ J be the Rayleigh functional, where D = C” \ {0}. Then 
J is an interval of positive (negative) type if and only ifT(a) is negative (positive) definite 
andT(b) is positive (negative) definite. Assume thatT(-) is continuously differentiable. Then 
J is of positive (negative) type if x*T'(p(x))x > 0 (< 0) for all x G D. If, in addition, T(-) 
is twice continuously differentiable and x*T"(p(x))x 0 for all x G D, then J is of positive 
(negative) type if and only if x*T'(p(x))x > 0 (< 0) for all x G D. 

On an interval of definite type, we have a variational characterization of eigenvalues of 
T(-) and the orthogonality of eigenvectors [TT][55]|in]. 

Theorem 2.5 (Nonlinear Variational Principle). Let J cM. be finite and of definite type, 
and T(-) be continuously differentiable on J. Then there exist exactly n eigenvalues {Xk}^^i 
ofT(-) on J that satisfy a variational principle. Specifically, if J is of positive type, then 

Xk = min{max{p(a;) \ xGS,xf^t)\ \ dim(S') = fc} and (2-1) 

Xk = max{min{p(x) \ xGS,xf^0}\ dim(5') = n — fc J-1}; 
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if J is of negative type, then 

Xk = max{min{p(a;) | x S 5, a: 7 ^ 0} | dim(S') = k} and (2-2) 

Afe = min{max{/ 9 (x) | x S S', x 7 ^ 0} | dim(S) = n — k + 1}. 

Moreover, there exist n corresponding eigenvectors {vk}k=i that form a basis ofC^, and they 
are orthogonal with respect to the scalar-valued function [•, •] defined as 

„] _ / y*iT{p{y)) - T{p{x)))x/ {p{y) - p{x)) ifp{x) ^ p{y) . , 

\-^'y^-\y*r{p{x))x ifp{x)=p{y) ■ 


A natural corollary of the nonlinear variational principle (2.1 1 or (2.2) is the nonlinear 
Cauchy interlacing theorem. 

Theorem 2.6 (Nonlinear Cauchy interlacing theorem [32]). Let J = {a,h) he finite and 
of definite type, T{-) be continuously differentiable on J, and U € contain m linearly 

independent column vectors. Then the projected eigenproblem U*T{v)Uy = 0 has exactly 
m eigenpairs {{v'j,yj)}JLi satisfying the nonlinear variational principle (2.1) or (2.2). In 
addition, if J is of positive type, then Xj < Vj < Xn-m+j', if J is of negative type, then 
Xn-m+3 < V] < Xj (1 < j < m). 


From Definition 2.3 and the nonlinear variational principle (2.1) or (2.2), we see that 
the eigenvalues of the nonlinear Hermitian eigenproblem on an interval of definite type can 
be ordered in the same manner as for linear Hermitian eigenproblems. In particular, this 
ordering is needed when PLMR is used for computing many successive extreme eigenvalues, 
as described in Section 5.5. 


3. The single-vector PLMR. In this section, we present a basic version of the PLMR 
method for computing an interior eigenvalue and the associated eigenvector of the nonlinear 
Hermitian eigenproblem. We discuss the main building blocks of the method, including devel¬ 
opment of the search subspace, preconditioning, and extraction of the approximate eigenpair. 

3.1. Development of the search subspace. Assume that A is a unique distinct eigen¬ 
value of r(-) closest to tr S J, and let u be a corresponding eigenvector. Suppose that in the 
A:-th iteration we have an approximate eigenvector Xk- Our goal here is to develop a search 
subspace lAk, from which a more accurate eigenvector approximation can be extracted. 

We start by reviewing the search subspace constructed within a variant of the PCG 
method, the Locally Optimal Preconditioned Conjugate Gradient (LOPCG) algorithm |32) . 
for computing the lowest eigenvalue of r(-). Given the current eigenvector approximation Xk, 
LOPCG defines the search subspace as 

^LOPCG ^ span{xfc,M"^Vp(xfe),pfc_i}, 


where 

is the gradient of the Rayleigh functional p(-) at Xk (see |32l Proposition 3.1]), M « T{a) is a 
preconditioner, and Pk-i is the search direction connecting Xk-i and Xk (p-i = 0). Note that 
T {p{xk)) Xk defines the residual of the eigenproblem, which is parallel to the gradient Vp(xfc). 

For the sake of simplicity, we let pk = p{xk) when there is no danger of confusion. The 
above search subspace can then be written as 

^LOPCG ^ [M~^T{pk),Xk) + span{pfc_i}, 


(3.1) 
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where b) = span {6, Ab,..., A™~^b} denotes an m-dimensional Krylov subspace [25]. 

This three-dimensional search subspace, unfortunately, is not effective for computing interior 
eigevalues. The main issue is that the convergence towards these eigenvalues can be fairly 
slow, and thus a search subspace of a larger dimension is needed to stabilize and accelerate 
the convergence. This is especially evident if the preconditioner M is not very strong. 


In order to properly enlarge the LOPCG subspace (3.11, we consider the subspace 


Uk=JCm+iiM ^T(pfc),a;fc)-I-span{pfc_i}, 


(3.2) 


where JCm+i{M~^T{pk),Xk) generalizes the limit space JCm+i {M~^{A — pkB),Xk) of the 
Generalized Davidson method that restarts every m steps |23|. The augmentation of this 
Krylov subspace with the search direction Pk-i is expected to accelerate the convergence as 
it does for PCG methods. 


3.2. Stabilization of preconditioning. A drawback of the search subspace (3.2) is 


that it can potentially suffer from numerical instabilities. To see this, let M — T{a) and 
Pk = cr. Then M~^T{pk) = I, and the search subspace degenerates to span{xk,Pk-i}- 
Therefore, an algorithm based on the search subspace (3.2) stagnates, i.e., cannot generate 


any improvement in eigenvector approximation. In practice, stagnation could arise whenever 
M is a good approximation to T{a) and pk is sufficiently close to a. The same issue is 
known for the Davidson type methods for linear eigenproblems, which has been fixed by the 
Jacobi-Davidson (JD) algorithm [5|, |55|. 

A key ingredient contributing to the robustness of the JD methods is the modification 
of the preconditioning procedure in such a way that it is performed through solution of a 
correction equation rather than a direct application of M~^. In particular, for nonlinear 
eigenproblems, the correction equation of the basic JD methocQis of the form 


IliT{pk)Tl2Axk = -T{pk)xk, 


(3.3) 


where Hi and 112 are properly chosen projectors, such that T(pk)xk G range(ni) and n 2 Aa;fc = 
Axk] see, e.g., 0, m Chapter 6.2], [30]. For Hermitian T(-), one can choose 


rr _ T T'{Pk)XkX*k 

^ ~ xlT'{pk)xk “ 


U2=Ul=I- 


XkxlT'{pk) 
x*kT'{pk)xk' 


(3.4) 


In this case, the coefficient matrix in (3.3) is Hermitian, and thus efficient preconditioned 
linear solvers such as MINRES |21| or SQMR m can be applied. It can be shown that the 


exact solution Axk of (3.3) with the projectors defined in (3.4) satisfies 

x*kT'{pk)xk 


Xk T Axk — 


x*kT'{pk)T{pk) ^T'{pk)xk 


T{pk) T'{pk)xk, 


which is parallel to the new iterate Xk+i = T(pk) ^T'{pk)xk obtained from the Rayleigh 
functional iteration; see [271 Ghapter 4.3], [20] . 


Motivated by the structure of the coefficient matrix in the correction equation (3.3), we 
modify the preconditioner M « T((t) by multiplying it with the projectors in (3.4). This 
replaces M with a stabilized preconditioner 


Mn=nMn*, (3.5) 


^The basic variant of JD forms the new approximation as Xk+i = Xk +Axk, where Axj. is an approximate 
solution to the correction equation; no subspace expansion and projection is involved. It is also referred to as 
single-vector JD or simplified JD in literature; see, e.g., m, m- 
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where 11 = Hi, as defined in (3.41. Thus, the preconditioner is now applied to a vector through 




rather than M 


The precise formula that describes the action of on a given vector can be derived as 
follows. We consider vectors y and b such that b = M^y = IlMIl*y, where b G range(n), and 
y T T'(pk)xk^ i.e., Ii*y = y. These assumptions are standard in the preconditioning for JD, 
identical to those used in [30], jUj- Equivalently, 


b = 


fj T'{pk)xkxl \ 
V xlT'{pk)xk) 


My 


My - T'{pk)xk~^ 


xjMy 

T'{pk)xk 


and it follows that 


y 


= M-^h + 


M-^r{pk)xk 


xjMy 

xlT'{pk)xk 


(3.6) 


The orthogonality y T T'(pk)xk implies that 


0 = xlT'{pk)y 


xlT\pk)M-^b + xlT'{pk)M-^T\pk)xk 


xjMy 

xlT'{pu)xk 


from which we have 


xlMy 

xlT'{pk)xk 


xlT\pk)M-^b 
' xlT'{pk)M-^T'{pk)xk 


(3.7) 


Thus, after substituting (3.7) into (3.6), we obtain 


y = Ml^b = M-H - ^b M-^T'{pk)xk 


= /- 


xlT'{pk)M-^T'{pk)xk 
M-^r{pk)xkXir{pk) \ 
xlT'{pk)M-^T'{pk)xk j 


M-^b. 


(3.8) 


In contrast to M~^, the operator M^ does not cancel out with the matrix T{pk) and, 
instead, applies M~^ to T'{pk)xk, magnifying the desired eigenvector component. Note that 
M~^T'{pk)xk in (3.8) can be computed only once and further used to evaluate vectors of the 


form M^T{pk)z, for any T{pk)z G range(n). This observation is needed for the construction 
of the PLMR search subspace. 


Finally, given a stabilized preconditioner (3.5), whose action on a vector is expressed 


in (3.8), we define the PLMR search subspace as 


=lCm+l{MlT{pk),Xk) 


span{pfc_i}. 


(3.9) 


which is exactly (3.2) with M replaced by Mn. Our numerical experience confirms that 


the PLMR version built upon (3.9) indeed tends to be significantly more robust than that 
based on (3.2). Therefore, throughout, we only use (3.9), constructed with the stabilized 


preconditioner, as the search subspace for the PLMR algorithm. 


3.3. Subspace projection and extraction. Given the PLMR search subspace (3.9), 


we now consider the projection of the original eigenproblem onto this subspace, and describe 
the extraction of a new eigenvector approximation. 

Similar to the linear setting, the standard Rayleigh-Ritz procedure is ideal in preserving 
symmetry and is most suitable for computing extreme eigenvalues, but it generally exhibits 
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very slow convergence towards interior eigenvalues. As a remedy, the harmonic Rayleigh-Ritz 
scheme could be used. However, the main disadvantage of this approach is that it does not 
preserve symmetry of the original eigenproblem. In general, algorithms for solving interior 
eigenvalues of nonlinear eigenproblems without symmetry are significantly more complicated 
and tend to be less robust than those for solving nonlinear Hermitian eigenproblems admitting 
a variational principle; see, e.g., [41] |42j and references therein. In addition, solving nonlinear 
projected eigenproblems that fail to preserve the structure (non-Hermitian in our case) can 
lead to a significant loss of accuracy in the final eigenpair approximations. 

To resolve this issue, we propose using the standard Rayleigh-Ritz projection, followed 
by a procedure to detect and discard spurious Ritz values, and a refinement step to further 
improve the quality of eigenvector approximation. Note that spurious Ritz values are Ritz 
values close to the desired shift cr, but they correspond to poor eigenvector approximations, 
typically given by linear combinations of eigenvectors associated with eigenvalues outside the 
interval of interest. They often arise frequently when interior eigenvalues are sought. 

Specifically, let Uk G cnx("i-i- 2 ) orthonormal basis vectors of the search sub¬ 


space (3.9). The Rayleigh-Ritz scheme then leads to the projected Hermitian eigenproblem 


UtT{v)Uky = 0 , 


(3.10) 


which also admits a variational characterization of its eigenvalues (i.e., of the Ritz values) sat¬ 
isfying the nonlinear Cauchy interlacing theorem (Theorem |2.6[ ). The projected eigenproblem 
can be solved, e.g., by PCG methods based on the variational principle [5^ . 

After forming the projected problem ( 3.10[ ), the following approach is used to obtain an 
approximate eigenpair. First, we solve ( 3.10[ ) for the r successive Ritz values vi, ..., closest 
to cr and the associated eigenvectors j/i,... ,?/r, and compute the corresponding Ritz vectors 
Zi = UkUi (1 < i < r). We then order {vi} according to the residual norms of the respective 
Ritz pairs, such that for any i, j with 1 < i < j < r, 


\\T Zi\\^ ^ \\T (i^j) Zj\\^ 


(3.11) 


Next, we take s Ritz pairs {ui, Zi) of minimal eigenresidual norm from the r candidates, and 
choose the Ritz value, say (1 < £ < s), that is closest to cr. 

The motivation of the above approach is well founded. We first find a relatively large 
number, r, of Ritz values near a, so that a good eigenvalue approximation is included in this 
set. Other Ritz values, not selected, are relatively far from cr, and cannot represent accurate 
approximations to the desired eigenvalue. The ordering of Ritz pairs in terms of eigenresidual 
norm tends to put promising Ritz pairs to the front of the ordered set and others to the end. 
This step aims to filter out spurious Ritz pairs, i.e., those with Ritz values close to cr but 
with large eigenresidual norms. Such pairs commonly arise in the Rayleigh-Ritz projection 
for computing interior eigenvalues, and are excluded from further consideration due to the 
proposed ordering. As a result, we have a fairly good chance that a promising Ritz pair 
is included in the set of s candidates with minimal eigenresidual norm. Finally, the Ritz 
value iy£ closest to cr is selected from the s candidates. In our implementation, by default, 
r = min(m -|- l,max(5, \{m -|- l)/2])), and s = 2. 

By construction, the selected interior Ritz pair {vi,Z£) has a reasonably small eigenresid¬ 
ual. However, in most cases, it can be further significantly improved. To this end, we refine the 
Ritz vector by substituting it with a new eigenvector approximation that delivers a minimal 
residual. That is, we solve 


VMR = argmin||j^||^i \\T{v£)Uky\\2 > 


(3.12) 










Algorithm 1: The PLMR(m) algorithm for a Hermitian eigenproblem T{X)v = 0 

Input: An initial eigenvector approximation xo € C” \ {0}, a preconditioner M, 

a shift (T £ R, and integers r, s > 0; 

Output: An eigenpair (A,w), where A is the eigenvalue of T(-) closest to a; 


1 : 

2 : 

3: 

4: 

5: 

6 : 


7: 

8 : 

9: 

10 : 

11 : 

12 : 


Set k 0 and p_i []. Compute po = p(a:o)- 
while convergence not reached do 
If fe > 0, then pk-i ^ Xk — Xk-i- 


Compute an orthonormal basis Uk of the search subspace (3.91, where the action of 
the preconditioner Mn is given by (3.8l. 


Solve the Rayleigh-Ritz projected eigenproblem (3.101. 

Select the r Ritz values closest to a, and sort the corresponding Ritz pairs according 
to their eigenresidual norms (3.111. Then choose the s Ritz pairs with minimal 
eigenresidual, and use them to identify the Ritz value vt closest to cr. 

Compute the right singular vector umr associated with the smallest singular value 

of the matrix T{i/t)Uk- 

Set Xk+l t— UkVMR’, Pk+l t— p{Xk+l). 

Normalize Xk+i, such that xl^-^T'{pk+i)xk+i = 1. 
fc t— fc -I- 1. Check convergence of {pk,Xk)- 
end while 

Set X pk', V Xk- Return (A,u). 


and set the new iterate to Xk+i = UkUMR- The corresponding eigenvalue approximation is 
then given by the Rayleigh functional Pk+i evaluated at Xk+i- 

Problem ( 3.12[ ) can be approached by finding the smallest singular value of the matrix 
T{vi)Uk G and its right singular vector, or equivalently by solving the linear eigen¬ 

problem UkT{i'e)*T{i'i)Uky = yy for the eigenvector corresponding to the smallest eigenvalue. 
Note that the entire strategy described above is a direct generalization of the refinement pro¬ 
cedure of [12] to the case of nonlinear eigenproblems. As we shall see in Section]^ step (3.12) 
indeed turns out to be crucial for stabilizing the convergence of PLMR. The whole PLMR 
scheme is summarized in Algorithm 1. 

4. Local convergence analysis. In this section, we provide an analysis of the PLMR 
algorithm, explaining the conditions that guarantee its convergence and how rapidly it may 
converge. Our analysis is based on a close connection between the search subspaces developed 
by PLMR and the basic JD method, and on certain assumptions about the performance of 
the subspace projection and extraction. 

Specifically, let {pk,Xk) be the current approximation to the desired eigenpair (A, v), where 


A is the eigenvalue of T(-) closest to a. Recall from (3.3) the basic JD correction equation 


nT(pfc)n*Aa;fc = -T{pk)xk, 


and assume that the preconditioner (3.5) is used for a Krylov subspace method with right¬ 
preconditioning to solve this equation. In iteration to, the Krylov subspace developed for the 
preconditioned linear system is thus 

ICm (llT{pk)U*Ml,,Tipk)xk) , 

where 11 = / — T'‘(p^)xl ■ right-preconditioning, the approximate solution Axk 

of the original JD correction equation (3.3) lies in M^ICm (j^T{pk)Tl*M^,T{pk)xk^, and 



















9 


therefore the new approximation Xk+i = Xk + Axfc lies in 

spanjxfe} + (llT{pk)Il*T{pk)xk^ (4.1) 

= span ■|a;/c, MlT{pk)xk, Ml,UT{pk)Il*MlT{pk)xk, • ■ •, (nr(pfe)n*M^) ' T{pk)xk 
= span I Xk, MlT{pk)xk,[Ml,T{pk)y Xk ■ • ■ , \ — ^m+1 {Ml,T{pk) 5 


where we used the identity 


M^n = n*M^ = 


that can be derived from 13.4| and 13.8| without much difficulty. Clearly, the subspace (4.1) 


is a proper subspace of the PLMR(m) search subspace (3.9), an augmented version of (4.1). 
This observation is summarized in the following lemma. 


Lemma 4.1. Given the same current iterate Xk, basic JD with correction equation (3.3) 
delivers a new eigenvector approximation x^^^ lying in the search subspace where PLMR(m) 
extracts its new iterate 

Consequently, if is of the same quality as then the convergence of PLMR 

can be established as a corollary of the local convergence theorem of basic JD, already shown 
in our problem setting m Theorems 7, 11]. Whether the new iterates of the two methods are 
comparable in quality depends on the approximation properties of the refined Rayleigh-Ritz 
projection used in PLMR, which have been established for standard linear eigenproblems; see, 
e.g., [29l Chapter 4.4] and references therein. 

Let V be the desired eigenvector, and U contain basis vectors for the eigensolver search sub¬ 
space. Roughly speaking, under certain typically non-stringent conditions, Z(u, Uy), the angle 
between v and the corresponding Ritz or refined vector Uy, is proportional to Z(z;, range(t7)). 
For nonlinear eigenproblems T{X)v = 0, a complete study of similar properties of these tech¬ 
niques is beyond the scope of this paper. Nevertheless, in our numerical experiments, we find 
that Z{v,Uy) is also proportional to Z(u, range([/)) consistently. Therefore, we assume that 
this property holds, and give a major local convergence result of PLMR. 

Theorem 4.2. Let (A,u) be a simple eigenpair of the nonlinear Hermitian eigenproblem 
T{X)v = 0, where v is normalized such that v*T'{X)v = 1. Assume that there exist a S > 0 
and a corresponding ^ > 0, such that for any eigenpair approximation (/i, x) sufficiently close 

V 

X 

be the eigenvector approximation obtained in the k-th iteration of PLMR, where "fk, Ck and 
Sk are the generalized norm of Xk, generalized cosine and sine of Z{xk,v), respectively {see 


to {X,v), namely, with 


< S, we have ||T'(^)x|| < f. Let xj. = 'YkickV + 


Appendix). Suppose that Z(xo,v) is sufficiently small, such that 


\ Xo 


V 

[ p{xo) 


x\ 


< S. 


For each Xk, assume that the refined projection extracts a new eigenvector approximation 
Xk+i, such that sin Z(u, a:fe+i) < CsinZ{vfor a small constant C independent of k. 
Assume that the JD correction equa tion (3.3) is solved by right-preconditioned GMRES(mfc) 
with the preconditioner defined in (3.5). Let be a sufficiently small and fixed 


tolerance, and and be decreasing sequences of tolerances, where 

Cj 3 and C.y are sufficiently small constants independent of k. For each k, assume that mk 
is s uffic iently large, such that one cycle of GMRES(to/j) delivers an approximate solution 
of (3.3) satisfying the relative tolerance or respectively. Then PLMR(mfe) 


converges towards (A, v) at least linearly, quadratically or cubically, respectively. 



























10 


Proof. Given the above assumptions, it is shown in Theorems 7 and 11 in [30] that the 


basic JD method with approximate inner linear solves that satisfy the tolerances Tjf^\ 
and respectively, converges locally towards (A,n) linearly, quadratically and cubically, 

respectively. Note that basic JD generates the new approximation G as shown in 

(4.1), and therefore sinZ(v,xl^^) > sin By assumption, the refined projection 

of PLMR delivers the new eigenvector approximation satisfying sinZ(z;, < 

The convergence of 


CsinZ(ti,7/|’^'^^), and therefore sinZ(w,< CsinZ(z;,x 
PLMR thus follows directly from that of basic JD. □ 


JD 

fc+lh 


5. Block PLMR. In this section, we consider simultaneous computation of a few in¬ 
terior eigenvalues and their eigenvectors. To this end, we develop a block variant of PLMR, 
referred to as BPLMR. We shall see that most of the techniques used in PLMR can be ex¬ 
tended directly to the block case. We also discuss deflation techniques and describe their 
application to computing large numbers of successive eigenpairs. 


5.1. Search subspace and preconditioning. Assume that we want to find the q 
eigenvalues closest to cr, namely, Ai,..., M, such that |Ai — ct| < ..., < \\q — a\, together 
with the associated eigenvectors vi,..., ujj Let G be the block of 

eigenvector approximations at iteration fc, and let = diag(p^^\... denote a diagonal 
matrix of the Rayleigh functional values = p(a;[,*^). We define the block of eigenresiduals 


T(Xfe,$fc) = [T(pW)4'^ ■ • ■ 


and can hence construct a LOBPCG-type search subspace spanned by the columns of Xf^, 
T(Alfe,<i>fe), and Pk-i, where Pk-i carries information about the approximate eigenvectors in 
the previous step (P_i = 0); see Part I of this study [35]. Similar to the single-vector case, 
such a LOBPCG subspace can be further expanded to better accommodate approximations 
of the interior eigenpairs, leading to the BPLMR search subspace 

^BPLMR(m) ^ + range (Pfe_i), (5.1) 

where Km+i •, $fc), Xk^ is the block Krylov subspace generated by the starting block 

Xk and the linear operator Lfc(-) = Mj)jT( •, 4)^). That is, 

Xm+i ^Mj)jT( ■ ,^k),Xk^ = range{Al/j,Lfc(Affc),Lfc (Lfc(Alfc)),... ,L™(Alfe)}, 


where L"*)-) stands for the composition of L(-) with itself for m times, and range(Jf) denotes 
the column space of X. 

By analogy with (3.4) and (3.5), in (5.1), we introduce a stabilized preconditioner 


Mn = UMU*, 


(5.2) 


where 

n = /- Zk{xtZk)-^xi Zk = r{Xk. 4>fc) = [r'(p4)x4 • ■ • T\pi^^)xi^\ 


^To facilitate the description of interior eigenvalue computation, the numbering of eigenvalues here is 
different from the natural order defined in Definition 2.3. 
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The above projector 11 is a direct extension of the one defined in (3.4) to the block case. 
Similar to (3.8), the action of Mj^ on a block of vectors B G range (Mn) can be expressed as 


= {I- M-^Zk{ZlM-^Zk)~^Zl) M-^B. 


(5.3) 

Clearly, (5.1) represents a sum of q PLMR search subspaces ( |3.9| ) with starting vectors 
(1 < i < g) and the preconditioner Mn in (5.2), and is of dimension (m + 2)q in general. 

The block search direction Pk-i can have several possible formulations. One option is 
to define Pk-i = — Xk-i, which represents a direct generalization of the single-vector 

directions pk-i = Xk — Xk-i used is PLMR. An alternative formulation can be given by 


Pk-i =Xk- Xk-i {X*_,Xk-i) ' X*_,Xk 


(5.4) 


which is a residual of the least squares problem miufjgcgx, \\Xk — Xk-iG\\F- Hence, defini¬ 


tion (5.4) guarntees that Pk-i has the smallest norm columnwise for all blocks of the form 


Xk - Xk-iG, where G G 

In exact arithmetic, the two variants of Pk-i lead to the same search subspace, because 

^BPLMR(„^) ^ ,ange{Pfc_i} + ICm+i (m|jT( •, ^k).Xk) 

= range{Xfc_i} -P Km+i (mJjT( •, $fe), . 

In practice, BPLMR(m) working with either version of Pk-i converges equally rapidly in most 
cases, but formulation (5.4) tends to perform slightly better occasionally. We have no complete 


understanding of this, but have an intuitive explanation. The individual eigenvector approxi¬ 
mations in Xk are usually properly ordered, e.g., by the distances between the corresponding 
eigenvalue approximations and cr. As the algorithm proceeds, the ordering of some eigenvec¬ 
tor approximations could change due to the change of their eigenvalue approximations. When 
such a change occurs, Pk-i = Xk—Xk-i generates poor search directions that represent the 
difference between approximations to distinct eigenvectors in two consecutive iterations. By 
contrast, the least squares problem finds a matrix G = {X^_-^Xk-i)~^Xl_^Xk that ‘reorders’ 
the columns of Xk so that XkG aligns columnwise with Xk, and thus Pk-i = Xk— Xk-iG 
represents the the difference between the subspaces spanned by the two block iterates, and it 
is more likely to be numerically favorable. 

5.2. Subspace projection and extraction. The subspace projection and extraction 
for BPLMR also follow PLMR closely. In particular, let Uk G contain orthonormal 

basis vectors of (5.1). First, we use the standard Rayleigh-Ritz procedure to obtain the 
projected Hermitian eigenproblem U^T{i/)Uky = 0, whose (m -I- 2)q eigenvalues (the Ritz 
values) satisfy the nonlinear variational principle (Theorem |2.5[ ) and the nonlinear Cauchy 
interlacing theorem (Theorem |2.6[ ). Next, we find the r Ritz values vi,... ,Vr that are closest 
to a, and order them according to the eigenresidual norms of the corresponding Ritz pairs, 


so that (3.11) holds for any 1 < i < j < r. Then, given the r candidates, we choose s Ritz 
values Vi, and the associated Ritz vectors Zi = UkUi, that yield the smallest eigenresiduals. 
Finally, out of these s Ritz pairs, we select the q Ritz values vi^,..., vg^ that are closest to a, 
and further use them in the refinement procedure. 

As we have already explained, the motivation for this approach is to filter out spurious Ritz 
values by first including all promising approximations in a relatively large set of r Ritz pairs, 
and then abandoning those with largest eigenresidual norm to obtain a set of s candidates. 
This set is used to choose the q most promising Ritz values, i.e., those closest to a. By default, 
we let r = min ((m -I- 2)q, max(3q, [(m -I- 2)q/3 \)) and s = 2q. 
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Finally, given the Ritz values ..., we use them in the refinement step 
Vmr = argmin||j^||^i \\T{vt,;}Uky\\^ , 1 < f < g. 


(5.5) 


Solving q minimization problems (5.51 allows defining the block ... of 

(■i) (0 

new approximate eigenvectors, such that x\^-^ = UkUjJji- corresponding values of the 
Rayleigh factional are then placed on the diagonal of $/c+i. 


5.3. Refined projection for semi-simple and tightly clustered eigenvalues. The 

refined projection described in Section |5.2| can generate improved approximations to indi¬ 
vidual eigevectors if all the targeted eigenvalues Ai,...,Ag are simple and well separated. 
However, if an eigenvalue of interest is semi-simple, i.e., dim (nullT(Ai)) = g > 1 for some i, 
or if g eigenvalues are tightly clustered, then the suggested refinement scheme has difficulties 
computing the entire invariant subspace. In this situation, the Rayleigh-Ritz procedure gen¬ 


erates several (up to g) Ritz values that are very close to each other. The refinement step (5.5) 


with these Ritz values then delivers almost identical new eigenvector approximations, which 
leads to an inaccurate approximation to the complete eigenspace. 

In order to adapt the refined projection to the case of semi-simple or tightly clustered 
eigenvalues, we propose the following strategy. We first select the most promising Ritz values 
vi^,... and distribute them among K groups Gi, ... ,Gk in such a way that all the 
values in one group are very close to each other, whereas those belonging to different groups 
are relatively well-separated. Next, we presume that the Ritz values inside each group Gr 
that contains multiple elements converge to a numerically semi-simple eigenvalue, so that each 
Gt aims at revealing a distinct semi-simple eigenvalue. In this case, instead of computing 
individual refined eigenvector approximations for the Ritz values in Gt using (5.5), we extract 


an orthonormal basis that approximates the entire eigenspace associated with the targeted 
semi-simple eigenvalue. This is achieved by utilizing singular vectors corresponding to several 
smallest singular values of the reduced matrices T[v)Uk, where iz is a representative value for 
the Ritz values in the given group. 

More precisely, we take Gi as an example, and assume without loss of generality that it 
contains g > 1 tightly clustered Ritz values, i.e., Gi = {vh ,..., zz^g} for some 1 < g < g. We 
then find the right singular vectors yi,... ,yg corresponding to the g smallest singular values 
of the matrix T(zzfj)I/fe, and define the new iterates as = Ukyi, where 1 < z < g. The 

constructed vectors deliver an orthonormal basis that is expected to approximate the 

eigenspace of a semi-simple eigenvalue A « zz^^ « ... « zz^^. Note that gi, ..., can also be 
the singular vectors of any of the matrices T{i'£-)Uk, since the values zz^. are very close to each 
other by construction (1 < z < g). 

In order to assign the Ritz values to the groups Gi,..., Gk , an appropriate threshold 
needs be chosen to determine if several values are sufficiently close to be included into one 
group. An excessively small threshold could mistakenly treat a semi-simple eigenvalue as 
several well-separated simple eigenvalues and thus encounter the difficulty described above 
(fail to generate the complete eigenspace accurately), whereas an overly large threshold could 
incorrectly treat several distinct simple eigenvalues as a semi-simple one, resulting in inac¬ 
curate eigenvector approximations. For example, in our BPLMR implementation, the Ritz 
values vi-^,..., V£ are assigned to the same group if 


max 

i=l,...,g 


\ve, - vl 


< 10 “ 


where zz = 




(5.6) 


Whenever available, an a priori information on distribution of the desired eigenvalues can be 
exploited for a more flexible threshold estimation. 
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It is clear that, in practice, the Ritz values of group Gr can converge to multiple tightly 
clustered eigenvalues, which contradicts our assumption on the convergence to a single semi¬ 
simple eigenvalue. Nevertheless, the assumption turns out to be non-restrictive. In fact, a 
group of tightly clustered eigenvalues can be considered as those arising from a small pertur¬ 
bation imposed on a semi-simple eigenvalue. Consequently, the invariant subspace associated 
with this group comes from a slight perturbation of the eigenspace corresponding to this 
semi-simple eigenvalue. In this case, it is not necessary, and in fact impractical, to compute 
each individual eigenvector to very high accuracy. The orthonormal basis obtained from our 
proposed approach forms a good approximation to the eigenspace corresponding to the pre¬ 
sumably semi-simple eigenvalue, and thus it provides a good approximation to the invariant 
subspace for the clustered eigenvalues. If there is need to resolve each individual eigenpair in 
this clustered group to higher accuracy, we can set the tolerance described in (5.61 moderately 
smaller. However, our experience indicates that an excessively small tolerance tends to delay 
the convergence, if the desired eigenvalue is indeed semi-simple. 

We note that additional care needs to be taken in the refinement step to avoid repeated 
convergence. This is because such a refinement procedure is constructed independently for 
each numerically distinct Ritz value, and thus the singular vectors coming from two different 
residual minimization problems (5.51 tend to be numerically linearly dependent whenever 
two selected Ritz values and Vi- are close but not sufficiently close to be distributed 
into one group. To tackle this difficulty, for each candidate new eigenvector approximation 
^k+i ~ ^kVMR (1 < * ^ ?): ■we check if is greater than some threshold, where 

X stands for the space spanned by all previously selected new eigenvector approximations 


‘•fc-i-i’ ■ 


(j-i) 

<^ 1+1 


We accept such a candidate if this criterion is satisfied; otherwise, we choose 

the singular vector associated with the next smallest singular value and test this condition 

(i) 

again, until a linearly independent new eigenvector approximation is found. 


5.4. Deflation. Deflation plays a crucial role in simultaneous calculation of several 
eigenpairs. It allows eigensolvers to exclude the converged quantities from the computation 
and update only unconverged eigenvector approximations. It also ensures that no repeated 
convergence occurs. For linear eigenproblems, deflation is based on the eigen-decomposition 
(Hermitian case) or the Schur form (non-Hermitian case), and is usually fulfilled by orthogo- 
nalizing the search subspace against the converged invariant subspace. Such a deflation mecha¬ 
nism is often called “hard deflation” (or “hard locking”), as the converged eigenvectors are not 
explicitly included into the search subspace. For nonlinear eigenproblems T{X)v = 0, deflation 
is performed by working with invariant pairs directly using special variants of Newton-like 
methods 0, m, |20j . or using the inhnite Arnoldi method that allows for a Schur form on a 
transformed linear space m. [IS]- 

For nonlinear Hermitian eigenproblems T(X)v = 0 that satisfy the variational principle 
(Theorem |2.5[ ), deflation can be performed without explicitly preserving invariant pairs, since 
all eigenvectors are linearly independent. One would naturally wonder if hard deflation is 
possible, e.g., through orthogonalization based on the scalar-valued function [-,•] defined in 
). Unfortunately, this approach is not viable, as •] is not bilinear in general, and thus the 
Gram-Schmidt procedure does not work. Instead, we simply include the converged invariant 
subspace into the BPLMR search subspace generated by the unconverged eigenvectors, and, 
after performing the refined projection, update only the unconverged pairs. This strategy is 
usually called “soft deflation” (or “soft locking”). 

Specifically, assume that the first d columns of have converged. We can then dis¬ 
tinguish between the converged and unconverged columns. The former can be placed into 
the matrix ... x^if'^], whereas the latter are used to form the “active” block 


( 2 .. 
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Algorithm 2: The BPLMR(m) algorithm for a Hermitian eigenproblem T{X)v = 0 


Input: Initial eigenvector approximations Xq £ a preconditioner M, the 

shift (T £ R, and integers r, s > 0; 

Output: q eigenpairs where Ai’s are the eigenvalues of T{-) closest to ct; 


1 

2 

3 

4 

5 

6 : 

7: 


Set ^Xo,k^0,d 
Set ^ [] and Plf 

while convergence not reached do 


0, and compute diag(p(a;Q^*),... ,p{x\^'’)). 


where the action of 

act 


If fe > 0, then ^ 

Compute an orthonormal basis Uk of the search subspace ( |5.7| 
the preconditioner Mn is given by ( |5.3[ ) with Zk = Z^^^ = T (^vj. , -i-j. j. 

Solve the Rayleigh-Ritz projected eigenproblem (3.101 for all Ritz pairs. 

Restore the converged d Ritz pairs, then select the r unconverged Ritz values clos¬ 
est to o, and sort the corresponding Ritz pairs according to their eigenresidual 
norms (3.111. Then choose the s Ritz pairs with minimal eigenresidual, and take 

closest to a from the s candidates. 

,Gk, such that (5.6 1 is 


the q — d Ritz values 






8: Distribute the q — d Ritz values among K groups Gi, 

satisfied. That is, the values in the same group are tightly clustered, and those in 
different groups are well-separated. 

9: (a) For r = 1,2, ...,1^, let Gt be the current group containing gij) Ritz values, 

and Vi(t) be a Ritz value in Gt. Find the smallest singular values of T(r'£(T))Gi; and 


associated right singular vectors j/i, 2 / 2 , ... 

(b) For i = 1, 2,..., d -I- (m + 2){q — d), compute the candidate new eigenvector 
approximation UkVi, and accept it only if Z{Ukyi, X) > S, where X is spanned by 
all columns of Xk°Zi and all previous accepted new eigenvector approximations. 

(c) Once gr(r) new eigenvector approximations are obtained for group Gt, reorder 

all new eigenvector approximations such that \p{x^k+i^)~ ^ ^ '^l- 

Normalize each column such that = 1 for all d-|- 1 < i < g. 

Move to process the next group until all K groups are processed. 

10: Determine the number d of converged eigenvectors. Set ... ,a:fe+i], 

X^f, ^ • • -,4^1], and ^ diag(p(xitY’). • • • ,P( 4 ^i))- 

11: fc-fl. Ifd = g, then declare convergence. 

12: end while 

13: Set Xi ■«— Vi <— x4 ■ Return (Ai, Vi) for i = 1,..., g. 


vact _ 

— 1-^fc 


The deflated BPLMR subspace can then be defined as 


^BPLMRl™) ^ ,ange(A“™) + ICm+i (MiiT(., A^) + range (^^‘ 1 ) , (5-7) 


where = diag(p(x^‘^^^^),... and the block search direction is constructed 


vact 
■^k-l 


according to ( |5.4[ ) with and Xk-i replaced by and respectively. Here 

refers to the eigenvector approximations in iteration k—1 that correspond to the active set in 
the current iteration k. Similarly, the preconditioner 
<i)fc and Zk replaced by A^ 


Jin 


is constructed as in (5.2), with A^, 


and T'(A^'^‘, respectively. Following the convention, 

we denote an orthonormal basis of (5.7) by Uk, which contains d -I- (m -|- 2)(g — d) columns. 
Given Uk, we perform the Rayleigh-Ritz procedure and solve the projected eigenprob¬ 


lem (|3.10|) to obtain a set of the Ritz pairs. We then recover the d Ritz pairs that have previ- 

and 
to 


ously converged. This is done by checking if a Ritz pair {ly, z) has both mini<i<rf |iz — 


Z( 2 :,range(A™”’')) sufficiently small. Next, we apply the strategy discussed in Section |5. 2 
select q — d promising Ritz values from the remaining {m 


- 2)(g — d) Ritz pairs, and then use 
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the selected Ritz values as shifts for the refined projection. The refined projection should be 
performed as described in Section |5.3| to avoid repeated convergence. The entire scheme of 
BPLMR with deflation is summarized in Algorithm 2. 

5.5. Computing many successive eigenvalues. In this section, we discuss an exten¬ 
sion of the use of PLMR methods for the computation of many successive eigenvalues. Such a 
computation is crucial in a variety of important applications, for example, where a large num¬ 
ber of the lowest eigenvalues and corresponding eigenvectors are desired. Traditional PCG 
methods are generally most reliable in this setting, but they rely on the min-max property of 
eigenvalues and thus require complete deflation of all converged eigenvectors. Consequently, 
both the memory and arithmetic cost gradually become prohibitive as the number of desired 
eigenvalues, Ud, grows to a few hundred or above. In addition, for nonlinear Hermitian prob¬ 
lems, the rapid increase in arithmetic cost is even more dramatic as Ud grows, because soft 
deflation including all converged eigenvectors is needed for the Rayleigh-Ritz projection. As 
a result, solving a single projected eigenproblem becomes increasingly time-consuming. 

To tackle this issue, we need to perform partial deflation, instead of complete deflation, 
of converged eigenvectors. The motivation for partial deflation is that PLMR methods are 
designed to generate approximations to eigenvalues around the shift a, and thus deflation of 
the eigenvectors associated with eigenvalues far from a is not necessary since the algorithms 
would not converge to those eigenvalues anyway, provided that a good preconditioner M « 
T((t) is available. Consequently, only a partial deflation of eigenvectors corresponding to 
eigenvalues near cr is sufficient to avoid repeated convergence. 

In fact, the partial deflation strategy can be easily developed based on the soft deflation 
we studied. Specifically, note that we can use soft deflation to avoid repeated convergence to 
any previously found eigenvectors, so that additional desired eigenpairs can be computed in 
an incremental manner. Let W £ contain £ converged eigenvectors already obtained. 

To deflate these eigenvectors, BPLMR simply develops the search subspace 

range(IT) + range(Xr"") + /C„+i (MtjT(., + range (^^‘ 1 ) , 

and treats W the same way as Specifically, it performs the Rayleigh-Ritz projection 

and obtains the ^ + d converged Ritz pairs. It then finds the q — d most promising Ritz values 
from the unconverged Ritz pairs and uses them as the shifts for the refinement procedure. New 
eigenvector approximations are generated as usual from the singular vectors corresponding to 
the smallest singular values of relevant matrices, and each candidate U^y is accepted only if 
Z(Uky, X) is not very small, where X is the space spanned by the columns of W, A™"'", and 
all previously selected new eigenvector approximations in iteration k. 

With the above extension of soft deflation, we now propose the ‘moving-window’ style 
partial deflation for computing successive eigenvalues of T(-) on an interval (a,b) C M. We 
start BPLMR with the set of converged eigenvectors IT = 0 to compute the q eigenvalues 
A 14 ,..., closest to CTi = a, and set the columns of W be the corresponding eigenvectors 
vip,... ,vi^q. Then we choose a nearby shift a 2 > ui, and use BPLMR with W to find the 
q eigenvalues A 2 q,..., -^ 2 ,^ near CT 2 . The two sets of eigenvalues should have no intersection 
due to the use of deflation. Then the new set of eigenvectors V 2 ,i, ■ ■ ■ ,V 2 ,q are added to 
W, and we choose a new shift CT 3 > cr 2 and invoke BPLMR again. At a certain step, if the 
current shift Ci is far from cti, for example, we remove the first set of eigenvectors vip ,..., vi^q 
from W. We also update the preconditioner when necessary to maintain rapid convergence for 
eigenvalues near the new shift. The maximum window size, i.e., the largest number of columns 
of W allowed, is determined upon a trade-off between the storage cost and the occurrence of 
repeated convergence. 
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The described partial deflation strategy is critical for keeping the total computational cost 
roughly proportional to the total number, n^;, of desired eigenvalues. Recently, a strategy with 
similar motivation, called “local numbering of eigenvalues”, has been successfully used with 
a basic nonlinear Arnoldi method for computing many successive eigenvalues [^. As we shall 
see in Section 6 , our proposed approach is highly reliable and efficient in this problem setting. 


6. Numerical Experiments. We illustrate the performance of the PLMR methods 
on a few Hermitian eigenproblems satisfying the variational characterization (2.1 1 or (2.2 1 . 
We shall see that the new algorithms exhibit rapid and robust convergence towards interior 
eigenvalues, provided that good preconditioners are available. Unless otherwise noted, the 
experiments were performed on a Macbook computer running Mac OS X 10.7.5, MATLAB 
R2012b, with a 2.4 GHz Intel Core 2 Duo processor and 4GB 667MHz DDR2 memory. 


problem 

type 

order 

interval 

wiresaw 

quadratic 

1024 

(0, 3250) 

genhyper 

quadratic 

4096 

(-843,0.3943) 

sleeper 

quadratic 

16384 

(-16.33,-1.61) 

string 

rational 

10000 

(4.4,1.2 X 10®) 

pdde 

nonlinear 

39601 

(-20.87,4.08) 

artificial 

nonlinear 

16129 

(-0.43,3.34) 

Laplace2D 

linear 

10000 

(0,8) 

Laplace^D 

linear 

125000 

(0,12) 


Table 6.1 

Description of the test problems 


We choose eight Hermitian eigenproblems for the test. The six nonlinear eigenproblems 
have been introduced in part I of our study [32] , but we describe them here again to make this 
paper self-contained. Table |6.1| summarizes these problems, among which the quadratic and 
the rational eigenproblems are constructed from the NLEVP toolbox |3]. The first quadratic 
eigenproblem wiresaw of order 1024 comes from the vibration analysis of a wiresaw, con¬ 
structed by the command nlevpC'wiresawl ’, 1024). The eigenvalues of this gyroscopic 
eigenproblem are purely imaginary and thus do not satisfy the variational principle ( 2 . 1 ) or 
(2.2), but they can be mapped to real eigenvalues of a transformed Hermitian eigenproblem 
by substituting A with i\. The transformed problem has 1024 pairs of real eigenvalues {A^}, 
where X~ = —A^, and {A“} and {A^} lie in Ii = (—3250, 0) and A = (0, 3250), respectively. 
The variational principle (2.1) holds on I( and (2.2) on A, respectively, and we look for the 
eigenvalues on A. Next, the hyperbolic quadratic problem genhyper of order 4096 is con¬ 
structed by the command nlevp(‘genhyper ’ ,ev, [eye(4096) eye(4096)]), where ev is a 
vector whose entries are the reciprocals of 8192 random numbers generated by randn function 
initialized with a zero seed. The elements of ev are set to be the eigenvalues of this problem, 
4096 of which are distributed on the left interval le = (—843, 0.3943), and the rest lie in the 
right interval A = (0.3943,20061). The variational principle in is satisfied on A and \2.2\ 
on /f, respectively, and we aim at solving the eigenvalues on The third quadratic problem 
sleeper of the form T(A) = Aq + AAi -|- A^A 2 of order 16384 models the oscillations of a rail 
track lying on sleepers. The problem is constructed by the command nlevp (‘ sleeper ’, 128), 
and then the matrix corresponding to the constant term is changed from Aq to Ag — 21, so 
that the modified problem satisfies the variational principle (2.2) on (-16.33,-1.61). The 
rational eigenproblem string of the form T(A) = A — \B -|- of order 10000 is gener¬ 

ated by the command nlevp (‘string’ ,10000); it arises in the finite element discretization 
of a boundary problem describing the eigenvibration of a string attached to a spring. The 
variational principle (2.2) holds on the interval (4.4,1.2 x 10®). 

Two truly nonlinear eigenproblems are described as follows. The first arises from the 
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modeling of a partial delay differential equation (pdde) [15] Ut{x, t) = Au{x, t) + a{x)u{x, t) + 
b{x)u{x,t — 2) defined on fl = [ 0 , 7 r] x [ 0 , 7 r] for t > 0, where a{x) = 8 sin(xi) sin(a; 2 ) and 
b{x) = 100| sin(a:i + X 2 )|, with Dirichlet boundary condition u{x,t) = 0 for all x S dfl and 
t > 0 Assume that the solution is of the form of u{x, t) = e^*v{x). Using the standard 5-point 
stencil finite difference approximation to the Laplacian operator on a 200 x 200 uniform grid, 
we obtain an algebraic eigenproblem T(A) = XI + {M -|- A) -|- where the matrices 

M, A and B of order 39601 are the discretized form of the Laplacian operator, respectively. 
The variational principle (2.2) holds on the interval (—20.87,4.08). The second is an artificial 
problem of order 16129 of the form T(A) = — sin ^A-|-VA -I- lB + e~^/^C, where A = I, B = 
tridiag[l; —2; 1], and C comes from the standard 5-point stencil finite difference discretization 
of the Laplacian, based on a 128 x 128 uniform grid on the unit square, without scalin g by 
the mesh size factor ^ = 128^ as done for the pdde problem. The variational principle (2.2) 
holds on (—0.43,3.34). 

The eigenproblems Laplace2D and LaplaceiD arise from the standard 5-point and 7- 
point stencil finite difference discretization of the Laplacian with Dirichlet boundary condi¬ 
tions on the unit square and unit cube, using 100 x 100 and 50 x 50 x 50 uniform grids, re¬ 
spectively. Both are linear eigenproblems with the majority of eigenvalues being semi-simple. 
Although our focus is on nonlinear problems, we include these examples to demonstrate the 
BPLMR’s capability to resolve multiplicities. The matrices A are of order 10000 {Laplace2D) 
and 125000 [Laplace'XD) and are generated using the MATLAB function laplacian.m down¬ 
loaded from the MATLAB Central File Exchange, developed by A. Knyazev. 


6.1. PLMR vs. PLHR. In this section, we demonstrate that the PLMR’s symmetry¬ 
preserving extraction strategy based on the refined Rayleigh-Ritz is crucial for the eigen- 
solver’s robustness. In particular, we compare PLMR to its version where the refined Rayleigh- 
Ritz is replaced with the harmonic projection. To distinguish between the two schemes, 
we refer to the latter as the preconditioned locally harmonic residual (PLHR) algorithm. 
Note that the same name is used for an interior linear eigenvalue solver [35j . which only 
loosely relates to the approach considered here. The non-Hermitian nonlinear eigenproblem 
UlT{a)*T{u)Uky = 0, encountered by PLHR within the harmonic Rayleigh-Ritz procedure, 
is solved for the harmonic Ritz pair associated with the harmonic Ritz value closest to u, 
using the residual inverse iteration [T6| [22] . 

Table |6.2| summarizes the performance of the two methods for computing the eigenvalue 
closest to a given shift. Let us take the problem wiresaw for instance to explain the results. 
We choose the shift a = 800, and we use the incomplete LDL"'’ factorization of T{a) with 
drop tolerance = 0.8 as the preconditioner to compute the eigenvalue A = 801.026019. 
The algorithms are terminated once the relative eigenresidual ||rfe|| = \\T{tlk)\\F\\xk \\2 
computed eigenpair {fJ,k,Xk) satisfies ||rfe|| < Tg = 10“^°. Starting with the same random 
initial approximation xq, it takes 14 iterations for both PLMR(2) and PLHR(2) to find the 
desired eigenvalue. 


Problem 

a 

A 

Parameters 

PLMR(2) 

PLHR(2) 

wiresaw 

800 

801.026019 

Td= 0.8, Te = 10-^“ 

14 

14 

qenhyper 

-300 

-283.145566 

O 

1 

o 

II 

CO 

o' 

II 

6 

8 

sleeper 

-9.1 

-9.09813985 

Td = 0.005, Te = 10-10 

19 

19 

string 

4.9 x 10'^ 48974187.5 

Td = 0.06, Te = 10-12 

22 

22 

artificial 

0.2 

0.19999002 

Td = 0.01, Te = 10-10 

20 

CXO 

pdde 

0 

0.00149342689 = O.OOl.Te = 10-^° 14 

Table 6.2 

Comparison of PLMR(2) and PLHR(2) in iterations 

CXO 
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Table |6.2| shows that PLMR converges at least as rapidly as PLHR for the four initial 
tests, whereas PLHR fails to converge (marked as oo) for the two remaining problems. The 
convergence failure occurs because PLHR stagnates with an eigenvalue approximation of low 
accuracy (e.g., with an eigenresidual norm around 10“^) not very close to cr. This stagnation 
can be fixed, however, by choosing a new shift cr closer to the desired eigenvalue, or by using 
a stronger preconditioner. 

Thus, while the convergence rate of PLMR and its PLHR variant is similar, the former 
tends to be more robust with respect to the quality of preconditioners and the choice of the 
shift cr. In addition, the robustness of PLMR strongly relies on a properly defined eigenvector 
extraction procedure, based on the symmtery-preserving refined Rayleigh-Ritz approach. 


6.2. Effectiveness of the refinement procedure. In this section, we illustrate the 
importance of using the refined projection to stabilize the convergence of PLMR. We show 
that PLMR converges considerably more robustly than the version without the refinement 
step, which only discards spurious Ritz values and uses the strategy described in Section 3.3 
to choose a Ritz pair as the new eigepair approximation. 

To demonstrate the effect of the refined projection, we compare PLMR(2) with the sim¬ 
plified variants of PLMR(2) and PLMR(4) that do not invoke the refinement procedure. We 
run the three methods with the same random initial approximations, repeating the experi¬ 
ment 20 times, and show in Table 6.3 the number of times each method successfully finds the 
desired eigenvalue in 100 iterations and the average count of preconditioned matvecs needed 
for the successful runs. 


Problem 

a 

Parameters 

PLMR(2) 

PLMR(2) w/o 
refinement 

PLMR(4) w/o 
refinement 

wiresaw 

1000 

'^d 

II 

O 

1 

CO 

II 

o 

1 

o 

20/20 

12.3 

3/20 

95.0 

19/20 

30.7 

gen.hyper 

-200 

'^d 

II 

O 

II 

o 

1 

o 

20/20 

11.1 

6/20 

26.7 

10/20 

15.2 

loaded^string 

10® 

'Td 

= 10-3, Te = 5 X 10-13 

20/20 

6.0 

16/20 

19.3 

20/20 

8.0 

sleeper 

-2 

Td 

= 10-3, Te = 10-1° 

20/20 

11.3 

5/20 

41.6 

19/20 

58.9 

pdde 

-1 

'^d 

= lO-'l, Te = 10-1° 

18/20 

9.2 

9/20 

30.7 

19/20 

17.8 

artificial 

0.5 

'^d 

o 1 

7 

o 

II 

1 

o 

II 

20/20 

11.4 

5/20 

12.4 

17/20 

14.6 


Table 6.3 

Comparison of PLMR(2) with two variants that do not perform the refinement step: number of successful 
runs and average counts of preconditioned matvecs 


We see clearly from Table |6.3| that the refined projection is crucial for the stabilization 
of convergence for PLMR. For example, we look for the eigenvalue of the problem wiresaw 
closest to tr = 1000, using the incomplete LDL"'" preconditioner with drop tolerance = 10“^. 
The relative tolerance for the computed eigenpair is = 10“^°. PLMR(2) always managed 
to find the desired eigenvalue, and it took 12.3 preconditioned matvecs on average to achieve 
convergence. Without the refinement step, by contrast, this method only succeeded 3 times, 
and on average it took 95 preconditioned matvecs to converge. In addition, PLMR(4) without 
the refinement step converged 19 times, and it took an average 30.7 preconditioned matves to 
find the desired eigenpair. In fact, with only one exception (for the problem pdde), PLMR(2) 
exhibits more robust and rapid convergence than the two variants without the refinement 
step. Note that for pdde, PLMR(2) was only marginally less robust than PLMR(4) without 
refinement, but was still considerably more efficient than the latter. In fact, with a stronger 
incomplete LDL"'" preconditioner with drop tolerance 10“®, PLMR(2) managed to outperform 
the latter in both robustness and efficiency. 


^There is no need to construct an incomplete LDL"’’ preconditioner for genhyper because T(cr) is diagonal. 
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6.3. Order of local convergence. In Section we presented a local convergence 
analysis of PLMR(TOfc), showing that the new method could exhibit local linear, quadratic 
and cubic convergence if ruk is sufficiently large for each step k. Here, we provide some 
numerical evidence to support the analysis. We note that in general, perfect quadratic and 
cubic convergence are rarely observed in practice. In addition, it is impractical to choose the 
optimal rrik for each k to achieve the expected order of convergence. Instead, we simply let 
= /3mi,, where /3 is a small fixed integer, to illustrate that PLMR(mfc) can easily achieve 
superlinear and superquadratic convergence. 


wiresaw {a = 800) 

Td = 0.8, 7o = 0.42 

mfc = 2 mk+i = 2mk nik+i = Sm* 
0.96 2.06 2.43 

sleeper {a = —9.1) 

Td = 0.01, 70 = 0.15 

— 2 '}TT'k-\-l — 2772/;; 77ljs-\-i = StTI/^ 

0.98 1.72 2.51 

string {a = 4.9 x lO'^) 

Td = 0.06, 7o = 0.1 

nik =3 ruk+i = 2mk nik+i = Snik 

0.97 1.76 2.31 

artificial {a — 0.2) 

Td = 0.01, 7o = 0.2 

nik = 3 mk+i = 2mfe mk+i = dm* 
1.01 2.45 3.91 

pdde {a = 0) 

Td = 0.001, 70 = 0.25 

nik =3 nik+i = 2mk nik+i = 3mk 
0.98 1.78 2.26 


Table 6.4 

Estimate of the order of local convergence for PLMRfmj,) 


Table 6.4 gives the estimates of the order of local convergence of PLMR(TOfc) for five test 
problems. Let us again take the problem wiresaw as an example to interpret the results. We 
are looking for the eigenvalue closest to tr = 800, and we use the incomplete LDL"”" factoriza¬ 
tion of T{a) with drop tolerance = 0.8. The initial iterate is set as xq = ri -f 7 ou, where 
V is the desired unit eigenvector, 70 = 0.42, and u is a fixed unit vector generated by MAT- 
lab’s randn. We let nik = 2 and run twenty PLMR(mfc) steps. Then we record the relative 
eigenresidual ||rfc|| = for each k and generate points (log ||rfe||,log ||rfc+i||), for 

which we find the corresponding linear least squares fitting y = ax + b. The estimated order 
of convergence is the slope a = 0.96 of the linear fit. Next, we let mg = 2, m/j+i = 2mfc, and 
run three PLMR(mfe) steps. Using the same approach, we obtain an estimated convergence 
of order 2.06. Finally, we let toq = 2, mk+i = Sm/j, and we run two PLMR(mfc) steps to get 
the convergence order of 2.43. For all problems, we run 20, 3 and 2 steps, respectively, to 
capture linear, quadratic and cubic convergence. 

Our results show clearly that PLMR(mfe) with a small fixed ruk converges linearly, and 
it converges superlinearly and superquadratically with nik+i = and mk+i = 3rnk (or 
TO/c+i = 4mfe), respectively. In fact, higher order of local convergence is observed for the 
problem artificial. We note, however, the efficiency of PLMR(mfe) primarily depends on the 
total number of preconditioned matvecs, instead of the order of local convergence. Higher 
order of convergence is achieved with increasingly larger value of rrik as the method proceeds. 
Overall, our experience is that the total number of preconditioned matves needed to achieve 
a certain level of eigenresidual tolerance largely depends on the quality of the preconditioner, 
and it is relatively insensitive to the order of local convergence. 


6.4. PLMR vs. JD. We now compare PLMR(to) with the JD methods where the right- 
preconditioned GMRES(to) is used as an inner solver (further referred to as JD-GMRES(m)). 
Our numerical results show that PLMR(m) is at least as efficient as, and is often superior to, 
JD-GMRES(m), especially when working with search subspaces of a modest dimension. 

In Section]^ we studied a close connection between PLMR(m) and the basic variant of 
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wiresaw 



Td = 0.05 

10 

0 

II 

00 

0 

II 

a = 800 


PLMR 

6 12 14.05s (2) 

6 24 18.22s (4) 

4 28 19.66s (7) 

Te = 10 -^^ 70 

= 0.25 

JD-GMRES 

6 18 20.84s (2) 

4 28 22.31s ( 6 ) 

4 36 23.42s ( 8 ) 

genhyper 



Td = — 



a = -300 


PLMR 

6 12 0 . 22 s ( 2 ) 



0 

1 

0 

II 

= 0.04 

JD-GMRES 

5 20 0.14s (3) 



sleeper 



Td = 0.002 

0 

0 

II 

Td = 0.02 

a = -9.1 


PLMR 

4 16 1.07s (4) 

3 39 1.81s (13) 

3 66 3.98s (22) 

Te = 10-1", 70 

= 0.1 

JD-GMRES 

4 20 0.78s (4) 

3 42 1.09s (13) 

4 72 2.26s (17) 

string 



Td = 0.01 

Td = 0.06 

Td = 0.1 

(7 = 4.9 X 10 '^ 


PLMR 

4 8 0.82s (2) 

12 36 2.96s (3) 

8 72 4.73s (9) 

Te = 10-11, 70 

= 0.1 

JD-GMRES 

5 15 0.98s (2) 

7 56 2.48s (7) 

5 70 2.66s (13) 

artificial 



Td = 0.002 

0 

0 

II 

Td = 0.02 

a = 0.2 


PLMR 

7 28 4.43s (4) 

6 54 6.33s (9) 

7 343 48.67s (49) 

II 

0 

1 

to 

0 

= 0.1 

JD-GMRES 

9 45 5.88s (4) 

7 56 5.50s (7) 

8 385 32.68s (54) 

pdde 



Td = 0.0002 

Td = 0.001 

Td = 0.0016 

CT = 0 


PLMR 

3 24 6.99s ( 8 ) 

3 30 8.25s (10) 

4 44 11.81s (11) 

Te = 10-1", 70 

= 0.1 

JD-GMRES 

5 30 7.94s (5) 

3 39 8.41s (12) 

3 54 10.69s (17) 


Table 6.5 

Comparison of PLMR{m) and basic JD-GMRES(m) in iterations, preconditioned matvecs, CPU time 
and optimal values of m (in parenthesis) 


the JD-GMRES(m) algorithm. It is now of interest to compare the two methods, both in 
terms of local and global convergence. 

As in the previous section, we choose a shift a for each test problem, construct the cor¬ 
responding incomplete LDL"’’ preconditioner with certain drop tolerance, and run PLMR(to) 
and basic JD-GMRES(to) with the same initial iterate xq to find the eigenvalue around the 
given shift. With a randomly generated xo, PLMR(m) with a sufficiently large m always 
converges to the desired eigenvalue closest to a, whereas basic JD-GMRES(to) always mis- 
converges to a different eigenvalue. This is what we expected, as basic JD without subspace 
expansion has poor global convergence, unless a fixed shift is used in the correction equation 
for sufficiently many steps before a good eigenvector approximation can be obtained. The 
PLMR(m) method, in contrast, consistently exhibits a robust global convergence. 

Next, let us compare the two algorithms with optimal values of m in local convergence. 
To construct the initial iterate, for both methods, we let xq = v + 70 M, where v is the desired 
unit eigenvector, it is a fixed unit perturbation vector generated by random function, and 70 
is a small scalar representing the error of xq- 

Take the problem wiresaw with shift a = 800 as an example. The preconditioner used is 
the incomplete LDL"'’ factorization of T{a) with drop tolerance Td = 0.05, 0.5 and 0.8, respec¬ 
tively. The initial iterate is constructed as a:o = it -|- ygit, where 70 = 0.25. Table 6.5 shows 
that with Td = 0.05, the optimal value m for both PLMR(m) and JD-GMRES(m) is m = 2 (in 
parenthesis), since it leads to the smallest total number of preconditioned matvecs. PLMR(2) 
converges in 6 steps, taking 12 preconditioned matvecs and 14.05 seconds, to achieve the 
relative tolerance \\T{p^)\lp^xl \\2 — “ 10“^^. JD-GMRES(2) also converges in 6 iterations, 

taking 18 preconditioned matvecs and 20.84 seconds. The results of the algorithms for higher 
drop tolerances of incomplete LDL"^ preconditioners are obtained similarly. 

We see the following patterns in the performance comparison for local convergence. 

1. PLMR(m) outperforms the basic JD-GMRES(m) in the total number of precondi¬ 
tioned matvecs in essentially all circumstances. This is partially due to the fact that 
the former and the latter take m and m -I- 1 preconditioned matvecs, respectively, in 
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each iteration step. Consequently, PLMR(m) also tends to perform better in CPU 
time if the preconditioned matvec is expensive. This is an advantage of PLMR over 
other types of preconditioned eigensolvers, such as the nonlinear Arnoldi method, 
which converges considerably slower than JD if the preconditioner is weak m, m- 
2. As the quality of preconditioners deteriorates, the optimal values of m for PLMR(to) 
and basic JD-GMRES(m) increase, and the latter tends to take less CPU time. This 
is natural, because as m increases, the cost of other computational components in 
PLMR(m), such as the refined Rayleigh-Ritz projection, becomes more pronounced. 
Such algorithmic components are intrinsically more expensive than the linear solver 
GMRES(m) for large m. In this case, the superiority of PLMR might be retrieved by 
replacing the weak preconditioned operation M~^ « T{a)~^ by a stronger one, e.g., 
an approximate linear solve with the coefficient matrix T(a) to a reasonably small 
tolerance, e.g., 10“^ to 10“®. 


In the following set of tests, we show that PLMR(m) is also more efficient than the 
full-featured JD-GMRES(m) method with a search subspace of variable dimension for the 
Rayleigh-Ritz projection (also referred to as full JD with subspace acceleration). Specifically, 
we compare PLMR(5) with JD-GMRES(5) working with a search subspace of dimension 5, 10 
and 20 (denoted as JD-GMRES(5)-1-RR(5), etc.), respectively, for computing 10 eigenvalues 
closest to a. Our implementation of JD is based on that described in [^, where the only 
difference is that our correction equation is formulated as in (3.3) and (3.4), using an identical 
projector for both algorithms. We let both methods start with the same random initial 
approximation xq, repeat the experiment 10 times, and take the average of the number of 
preconditioned matvecs. The parameters used to run the tests, together with results, are 
summarized in Table [6]6l 



PLMR(5) 

JD-GMRES(5) 

-I- RR(5) 

JD-GMRES(5) 

-1- RR(IO) 

JD-GMRES(5) 

-1 RR(20) 

wiresaw {a = 

Te = 10“^°, Td 

1000) 

= 10-° 

255 

1375 

484 

248 

genhyper {a = —200) 

Te = 10-1°, Td = 0 

628 

1581 

636 

344 

sleeper {a = 
Te = 10-1°, Td 

= -2) 

= 10-° 

265 

2281 

775 

396 

string {a = lO'’) 

Te = 5 X 10-l^ Td = 10-° 

273 

812 

371 

191 

pdde {a = 
Te = 10-1°, Td 

-1) 

= lO-'i 

172 

495 

165 

148 

artificial (a 
Te = 10-1°, Td 

= 0.5) 

= lO-"! 

181 

325 

174 

110 


Table 6.6 

Comparison of PLMR(5) and JD-GMRES(5) -I- Rayleigh-Ritz.- preconditioned matvecs counts for 
computing 10 eigenvalues around cr 


We see from Table 6.6 that PLMR(5) is considerably more efficient than JD-GMRES(5) 
-l-RR(5), and is essentially at least as efficient as JD-GMRES(5)-fRR(10). We note that 
PLMR(5) uses a search subspace of dimension 5, whereas the two variants of JD-GMRES(5), 
respectively, work with search subspaces of total dimension 5 -I- 5 = 10 and 5 -f 10 = 15, for 
the inner GMRES and outer JD iterations. 

We also tested other small values of m for the two algorithms and found similar patterns in 
performance. The robust convergence of PLMR is ensured by the refined projection, whereas 
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such a robustness of JD can only be achieved by the use of a large search subspace. For 
example, as can be seen in Table [63} the JD methods require three to five times more storage 
to become comptetive to PLMR(to). Thus, PLMR(m) is more efficient in both arithmetic 
and storage cost when a search subspace is of a modest dimension. 

6.5. PLMR vs. BPLMR. In this section, we perform some tests to show that BPLMR 
is generally more competitive than PLMR in arithmetic cost for solving a group of clustered 
eigenvalues. Such a conclusion has been well established for linear eigenproblems, but to the 
best of our knowledge, this is the first time it is done in a nonlinear setting. 


wiresaw {a 

= 800) 

PLMR(2) 

172 

344 

324.56s 

CO 

1 

o 

II 

= 10-1° 

BPLMR(2) 

11 

150 

37.50s 

genhyper (a = 

= -300) 

PLMR(2) 

155 

310 

5.74s 

II 

o' 

II 

10-10 

BPLMR(2) 

18 

178 

3.32s 

sleeper (cr = 

:-9.1) 

PLMR(2) 

117 

234 

22.68s 

II 

o 

1 

CO 

= 10-1° 

BPLMR(2) 

10 

150 

14.85s 

string (cr = 4.9 x 10^) 

PLMR(2) 

115 

230 

30.68s 

II 

o 

1 

CO 

= 10-1° 

BPLMR(2) 

9 

138 

13.61s 

artificial {a 

= 0.2) 

PLMR(2) 



OO 

1 

o 

II 

= 10-1° 

BPLMR(2) 

6 

50 

8.02s 

pdde (cr = 

= 0) 

PLMR(2) 

36 

72 

34.36s 

1 

o 

II 

= 10-1° 

BPLMR(2) 

16 

76 

31.64s 


Table 6.7 

Comparison of PLMR(2) and BPLMR(2) for computing 5 eigenvalues around the shift cr, in iterations, 
preconditioned matvecs and CPU time 


We compare PLMR(2) and BPLMR(2) with the same random initial approximations to 
compute five eigenvalues around the shift a. The drop tolerances Td of incomplete LDL"'" 
preconditioner and eigenresidual tolerances Te are also given. Recall that both algorithms 
use soft deflation of converged eigenpairs, and PLMR computes the eigenvalues sequentially 
whereas BPLMR generates the desired approximations simultaneously. 

Table 6.7 shows that BPLMR(2) is more efficient than PLMR(2) in arithmetic cost and 
CPU time for most problems. For the problem wiresaw, for instance, it takes PLMR(2) 172 
iterations, equivalently 344 preconditioned matvecs, and 324.56 seconds, to find the desired 5 
eigenpairs. In contrast, it takes BPLMR(2) 11 iterations, or equivalently 150 preconditioned 
matvecs, and only 37.50 seconds to converge. The performance difference for the two methods 
is minimal for the problem pdde. For the problem artifical, PLMR(2) failed to find the fourth 
eigenvalue around a in 500 iterations, but BPLMR(2) managed to find all the five eigenvalues. 
Clearly, for the same m, the block method is preferable in arithmetic efficiency, as long as 
sufficient memory is available for the larger search subspace it develops. 


6.6. Computing many successive eigenvalues. To verify the reliability of the new 
deflation techniques, we use BPLMR with the moving-window-style partial deflation described 
in Section 5.5, to compute a large number of extreme eigenvalues. We then compare the results 
with those obtained by PCG methods, which are most reliable in this setting. 

Table |6.8| summarizes the performance of LOBPCG and BPLMR for computing a few 
hundred or more extreme eigenvalues of the eight test problems. We take the problem string 
as an example to explain the results. The Ud = 400 lowest (L) eigenpairs of this problem 
are computed to the relative tolerance ||t(v^||f ||^^||2 — 10“^° (1 < * < 400). Block methods 
are used to find these eigenvalues sequentially, 10 eigenvalues each group, from the lowest to 
the highest ones. For each group of 10 eigenvalues, the block size is set to be 12 (slightly 
greater than 10) to stabilize the convergence. Once one group of eigenvalues are found, we 
compute the midpoint a = -^"-+^+1 between the two rightmost distinct computed eigenvalues 
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Xr and Ar+i, and let the new preconditioner be the LDL"^ factorization of T{a). Such a 
preconditioner is expected to accelerate convergence towards subsequent eigenvalues near a. 


problem 

Ud 

group 

size 

block 

size 

LOBPCG 
precond CPU 

matvecs time 

window 

size 

LOBPCG+BPLMR(2) 
precond CPU missed 

matvecs time 

repeated 

wiresaw 

500 (L) 

10 

12 

6010 

6566 

4 

7916 

3704 

0 

0 

Qenhvver 

500(H) 

10 

12 

8657 

7623 

4 

9028 

359 

0 

0 

string 

400 (L) 

10 

12 

5095 

49501 

4 

5843 

1434 

0 

0 

artificial 

500 (H) 

10 

12 

5337 

77905 

3 

7317 

2300 

1 

0 

pdde 

400 (H) 

5 

8 

4973 

61155 

4 

6457 

3284 

1 

0 

sleeper 

501 (L) 

10 

12 

5239 

9452 

4 

5804 

1234 

0 

0 

Laplace2d 

2001(L) 

10 

12 

- 

- 

5 

63749 

13269 

1 

0 

Laplace3d 

10002 (L) 

16 

20 

- 

- 

6 

199606 

369806'* 

3 

7 


Table 6.8 

Comparison of LOBPCG and BPLMR for computing successive eigenvalues: preconditioned matvecs 
and CPU time {in secs). Eigrenresidual tolerance is 10“'^^ for problem ‘string’ and for others. 


To illustrate the performance of our new method, we use LOBPCG to find the lowest 
four (window size) blocks of eigenvalues, and then run BPLMR(2) with the moving-window- 
style partial deflation of the most recently converged window-sized blocks of eigenvalues. 
This approach is compared with LOBPCG alone for computing all desired eigenvalues. As 
Table [6?8| shows, it takes LOBPCG 5095 preconditioned matvecs, and 49501 seconds to find 
the 400 lowest eigenvalues, whereas it takes the new method 5843 preconditioned matvecs, 
and only 1434 seconds. Our new approach not only avoided repeated convergence by partial 
deflation, but also did not miss any eigenvalue for this problem. The CPU time of the new 
method is lower, because it uses partial deflation, whereas the orthogonalization costs needed 
for complete deflation are the bottleneck in LOBPCG, despite that the latter has lower matvec 
counts. Similarly for the problem artificial, the highest (H) 500 eigenvalues are computed. 
It takes LOBPCG and the new method 5337 and 7317 preconditioned matvecs, and 77905 
and 2300 seconds, respectively. Only 1 eigenvalue is missed by BPLMR, and no repeated 
convergence occurs. We note that even block PCG methods could occasionally miss a few 
extreme eigenvalues of linear Hermitian eigenproblems [1] . 

As we can see, our new approach is essentially as reliable as PGG methods for computing 
extreme eigenvalues, but is significantly less expensive when a large number ng of eigenvalues 
are desired. The more eigenvalues are needed, the more advantage our method has over PGG 
methods. We emphasize that our test is simply an illustration of the reliability and efficiency 
of the new algorithm in the setting of computing all successive eigenvalues on a real interval. 
This method can be used to find many successive interior eigenvalues as well. 

Table |6.8| also shows that BPLMR is quite reliable to find semi-simple eigenvalues with 
correct multiplicities. As usual, we count a distinct eigenvalue with multiplicity ^ as ^ eigen¬ 
values. The last three problems in the table, namely, sleeper, Laplace2d and LaplaceUd all 
have a dominant majority of semi-simple eigenvalues. Specifically, only the lowest and the 
highest eigenvalues of sleeper are simple, and the rest are semi-simple with multiplicity 2. 
We see that BPLMR finds all the lowest 501 eigenvalues with correct multiplicities. For 
Laplace2d, among the lowest 2001 eigenvalues, 35 are simple and others are semi-simple with 
multiplicity 2. BPLMR obtains all these eigenvalues with correct multiplicities, with the only 
exception that one semi-simple eigenvalue is found with an incorrectly lowered multiplicity 1. 
Laplaceid is the most challenging problem, as only 15 eigenvalues among the lowest 10002 
ones are simple, and there are 379 distinct semi-simple eigenvalues with multiplicity 3, 1391 


^Performed on an iMac desktop computer running Mac OS X 10.8.5, MATLAB R2012b, with a 2.9 GHz 
Intel Core i5 processor and 16GB 1600MHz DDRS memory. 
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Problem sleeper 



Problem genhyper 



Problem Laplace3d 


Fig. 6.1. Total number of preconditioned matrix-vector products and CPU time used by LOBPCG and 
LOBPCG+BPLMR(2). Top: sleeper, lowest bOO eigenvalues Medium: genhyper, highest ^0^ eigenvalues 
Bottom: LaplaceSd, lowest 10002 eigenvalues {only LOBPGG+BPLMR(2) is used) 


with multiplicity 6, and 42 with multiplicity 12. BPLMR finds a vast majority of these eigen¬ 
values correctly. It misses 3 eigenvalues, and it converges repeatedly to 7 eigenvalues because 
those eigenvalues already moved out of the window. Using a larger window size will reduce 
the occurrence of repeated convergence. 

In terms of efficiency, the most remarkable pattern we see from Table |6.8| is as follows. 
LOBPCG based on the optimization of Rayleigh functional values always converges in fewer 
iterations than BPLMR, but the latter is significantly less expensive in arithmetic cost and 
thus takes much less CPU time if many (a few hundred or more) eigenvalues are desired. This 
observation is also clearly illustrated in Figure 6.1 for problems sleeper and genhyper as an 
example. As we explained, this is because BPLMR uses partial deflation, instead of the highly 
expensive complete deflation as LOBPCG does. Moreover, BPLMR based on partial deflation 
only needs a fixed amount of memory that depends on the block size, the search subspace 
dimension and the window size, but not on the total number of desired eigenvalues Ud- The 
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converged eigenvectors that have moved out of the window can be put on external storage 
because they will not be involved in subsequent computation of new eigenvalues. We see from 
Table |6.8| and Figure |6.1| that BPLMR with the moving-window-style deflation strategy is 
highly competitive if a large number of successive eigenvalues are desired. 

7. Conclusion. We have developed a Preconditioned Locally Minimal Residual (PLMR) 
method for computing interior eigenvalues of nonlinear Hermitian eigenproblems T{X)v = 0 
that admit a variational characterization of eigenvalues. We discussed the construction of 
the search subspace, stabilization of preconditioning, subspace projection and extraction, 
deflation, local convergence, and the extension to block variants. Our new algorithms are 
competitive in the rate and the robustness of convergence toward desired interior eigenvalues 
near a given shift. We also proposed a moving-window-style partial deflation strategy that 
enables BPLMR to compute a large number of successive eigenvalues. Numerical experiments 
show that the new approach is reliable, and is dramatically more efficient than PCG methods 
for computing many extreme eigenvalues. 
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